Skip to content

Various CPU Speedups#244

Draft
GNiendorf wants to merge 1 commit intomasterfrom
cpu_speedups_clean
Draft

Various CPU Speedups#244
GNiendorf wants to merge 1 commit intomasterfrom
cpu_speedups_clean

Conversation

@GNiendorf
Copy link
Member

@GNiendorf GNiendorf commented Mar 9, 2026

just testing various random improvements/optimizations that seem to help CPU timing on lnx4555. Still very much a work in progress, code needs to be cleaned up and refined a bit.

This PR Timing (CPU)
Screenshot 2026-03-11 at 11 23 16 PM

Master Timing (CPU)
Screenshot 2026-03-09 at 11 07 11 PM

@GNiendorf GNiendorf force-pushed the cpu_speedups_clean branch from 27baa73 to bf988bc Compare March 9, 2026 23:21
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 9, 2026
@GNiendorf GNiendorf changed the title [Test] Various CPU Speedups Various CPU Speedups Mar 10, 2026
@GNiendorf GNiendorf force-pushed the cpu_speedups_clean branch from 90516e1 to c85af30 Compare March 10, 2026 13:23
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@GNiendorf GNiendorf force-pushed the cpu_speedups_clean branch from 39dca9c to 51f681b Compare March 10, 2026 16:01
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@GNiendorf
Copy link
Member Author

run-ci: all

@slava77
Copy link

slava77 commented Mar 10, 2026

@GNiendorf I like the progress here very much so far.

@slava77
Copy link

slava77 commented Mar 10, 2026

please check the µ cube sample.

@GNiendorf GNiendorf force-pushed the cpu_speedups_clean branch 2 times, most recently from b70bd1c to be79ee2 Compare March 10, 2026 18:54
@GNiendorf
Copy link
Member Author

run-ci: all

@github-actions
Copy link

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     27.8    325.0    244.9    133.2     49.6    685.1     10.9    114.7    116.6    187.5      0.1    1895.3    1182.4+/- 281.2     583.8   explicit[s=4] (target branch)
   avg     27.4     97.3     84.9     94.2     43.4    124.2     10.6     33.8     60.3     51.0      0.1     627.3     475.6+/- 125.4     281.2   explicit[s=4] (this PR)

@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@SegmentLinking SegmentLinking deleted a comment from github-actions bot Mar 10, 2026
@github-actions
Copy link

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf GNiendorf force-pushed the cpu_speedups_clean branch from be79ee2 to 27b2ca2 Compare March 11, 2026 22:24
@GNiendorf
Copy link
Member Author

GNiendorf commented Mar 12, 2026

please check the µ cube sample.

Just checked and I there are no differences in the plots on the cube 50 sample. I will start going through this PR more carefully now and double check things before I open it for review, may take a while.

@GNiendorf
Copy link
Member Author

Manos suggested to break this PR up, and I think that makes sense. Going to open the first one shortly.

alpaka::math::asin(
acc, alpaka::math::min(acc, sdOut_dr * k2Rinv1GeVf / alpaka::math::abs(acc, pt_beta), kSinAlphaMax)),
betaOut);
acc, alpaka::math::min(acc, sdOut_dr * k2Rinv1GeVf / alpaka::math::abs(acc, pt_beta), kSinAlphaMax), betaOut);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please collect the non-technical changes, like these approximations in a separate collection of commits (or a separate PR).
For these asin replacements, perhaps it's better to ifdef or alias a function in the Common.h, e.g. approxAsin to then select the first order or the full approx at compile time.

Numerical reformulations (like the fastDeltaPhi) should be OK in the ~technical category.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate on Gavin's message and explain it, my suggestion to him was exactly to factor out the technical changes from the ones that had an effect on the physics performance (I notice some performance changes on the validation plots), and to test (and potentially have) a change that may negatively affect the GPU timing also separately (at least on a separate commit).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants